{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数组排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using matplotlib backend: Qt4Agg\n",
      "Populating the interactive namespace from numpy and matplotlib\n"
     ]
    }
   ],
   "source": [
    "%pylab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## sort 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "先看这个例子："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 20.8,  53.4,  61.8,  93.2])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names = array(['bob', 'sue', 'jan', 'ad'])\n",
    "weights = array([20.8, 93.2, 53.4, 61.8])\n",
    "\n",
    "sort(weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`sort` 返回的结果是从小到大排列的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## argsort 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`argsort` 返回从小到大的排列在数组中的索引位置："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 2, 3, 1], dtype=int64)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ordered_indices = argsort(weights)\n",
    "ordered_indices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以用它来进行索引："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 20.8,  53.4,  61.8,  93.2])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weights[ordered_indices]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['bob', 'jan', 'ad', 'sue'], \n",
       "      dtype='|S3')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names[ordered_indices]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用函数并不会改变原来数组的值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 20.8,  93.2,  53.4,  61.8])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weights"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## sort 和 argsort 方法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数组也支持方法操作："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 2, 3, 1], dtype=int64)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = array([20.8,  93.2,  53.4,  61.8])\n",
    "data.argsort()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`argsort` 方法与 `argsort` 函数的使用没什么区别，也不会改变数组的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 20.8,  93.2,  53.4,  61.8])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "但是 `sort`方法会改变数组的值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data.sort()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 20.8,  53.4,  61.8,  93.2])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 二维数组排序"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于多维数组，sort方法默认沿着最后一维开始排序："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.2,  0.1,  0.5],\n",
       "       [ 0.4,  0.8,  0.3],\n",
       "       [ 0.9,  0.6,  0.7]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = array([\n",
    "        [.2, .1, .5], \n",
    "        [.4, .8, .3],\n",
    "        [.9, .6, .7]\n",
    "    ])\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于二维数组，默认相当于对每一行进行排序："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.1,  0.2,  0.5],\n",
       "       [ 0.3,  0.4,  0.8],\n",
       "       [ 0.6,  0.7,  0.9]])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sort(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "改变轴，对每一列进行排序："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.2,  0.1,  0.3],\n",
       "       [ 0.4,  0.6,  0.5],\n",
       "       [ 0.9,  0.8,  0.7]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sort(a, axis = 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## searchsorted 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    searchsorted(sorted_array, values)\n",
    "\n",
    "`searchsorted` 接受两个参数，其中，第一个必需是已排序的数组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "sorted_array = linspace(0,1,5)\n",
    "values = array([.1,.8,.3,.12,.5,.25])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 4, 2, 1, 2, 1], dtype=int64)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "searchsorted(sorted_array, values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "排序数组：\n",
    "\n",
    "|0|1|2|3|4|\n",
    "|-|-|-|-|-|\n",
    "|0.0|0.25|0.5|0.75|1.0\n",
    "\n",
    "数值：\n",
    "\n",
    "|值|0.1|0.8|0.3|0.12|0.5|0.25|\n",
    "|-|-|-|-|-|-|-|\n",
    "|插入位置|1|4|2|1|2|1|\n",
    "\n",
    "`searchsorted` 返回的值相当于保持第一个数组的排序性质不变，将第二个数组中的值插入第一个数组中的位置：\n",
    "\n",
    "例如 `0.1` 在 [0.0, 0.25) 之间，所以插入时应当放在第一个数组的索引 `1` 处，故第一个返回值为 `1`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from numpy.random import rand\n",
    "data = rand(100)\n",
    "data.sort()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不加括号，默认是元组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.4, 0.6)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bounds = .4, .6\n",
    "bounds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "返回这两个值对应的插入位置："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "low_idx, high_idx = searchsorted(data, bounds)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "利用插入位置，将数组中所有在这两个值之间的值提取出来："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.41122674,  0.4395727 ,  0.45609773,  0.45707137,  0.45772076,\n",
       "        0.46029997,  0.46757401,  0.47525517,  0.4969198 ,  0.53068779,\n",
       "        0.55764166,  0.56288568,  0.56506548,  0.57003042,  0.58035233,\n",
       "        0.59279233,  0.59548555])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[low_idx:high_idx]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
